[NFC] cache repeated tree walks to avoid O(N^2) in optimizeTerminatingTails in CodeFolding#8602

Open
Changqing-JING wants to merge 7 commits into WebAssembly:main from Changqing-JING:opt/compile-speed3

@Changqing-JING Changqing-JING commented Apr 14, 2026

Cache the result of getBranchTargets(getFunction()->body) in optimizeTerminatingTails so that recursive calls share the same computed set rather than each re-walking the entire function body. This avoids O(N²) behavior where N is the size of the function body, since the recursive calls previously each performed an O(N) tree walk. The cached targets are computed lazily on first need and passed through to the canMove overload that accepts pre-computed branch targets.
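
The caching pattern described above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `Node`, `collectBranchTargets`, `processTails`, and the walk counter are hypothetical stand-ins for Binaryen's IR, `getBranchTargets`, and `optimizeTerminatingTails`. It only shows a lazily filled `std::optional` being threaded through recursive calls so that the O(N) walk happens at most once.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expression tree; not Binaryen's IR.
struct Node {
  std::string branchTarget; // empty if this node does not branch
  std::vector<std::unique_ptr<Node>> children;
};

// Counts full-tree walks so the saving is observable.
static int fullWalks = 0;

static void collect(const Node& node, std::set<std::string>& out) {
  if (!node.branchTarget.empty()) {
    out.insert(node.branchTarget);
  }
  for (const auto& child : node.children) {
    collect(*child, out);
  }
}

// O(N) walk over the whole body (assumed analogue of getBranchTargets).
static std::set<std::string> collectBranchTargets(const Node& body) {
  ++fullWalks;
  std::set<std::string> targets;
  collect(body, targets);
  return targets;
}

// Recursive worker. The cache stays empty until a level needs the targets;
// every deeper level then reuses the same set.
static void processTails(const Node& body,
                         int depth,
                         std::optional<std::set<std::string>>& cache) {
  if (depth == 0) {
    return;
  }
  if (!cache) {
    cache = collectBranchTargets(body); // computed lazily, at most once
  }
  // A real pass would consult *cache here to decide whether code can move.
  (void)cache->count("exit");
  processTails(body, depth - 1, cache);
}

// Runs the recursion with a fresh cache and reports how many walks occurred.
int walksWithCache(const Node& body, int depth) {
  fullWalks = 0;
  std::optional<std::set<std::string>> cache;
  processTails(body, depth, cache);
  return fullWalks;
}

// Baseline: one full walk per recursion level, as before the change.
int walksWithoutCache(const Node& body, int depth) {
  fullWalks = 0;
  for (int i = 0; i < depth; ++i) {
    (void)collectBranchTargets(body);
  }
  return fullWalks;
}

// Builds a tiny body with a single branching child.
Node makeSampleBody() {
  Node body;
  auto child = std::make_unique<Node>();
  child->branchTarget = "exit";
  body.children.push_back(std::move(child));
  return body;
}
```

With a recursion depth of 5, `walksWithCache` performs one walk and `walksWithoutCache` performs five. A depth of 0 performs none, which is the point of computing the set lazily rather than up front.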

Benchmark data

For the test case in #7319 (comment)
Main head:

time ./build/bin/wasm-opt --code-folding --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling -o /dev/null ./test3.wasm

real    5m45.996s
user    6m6.267s
sys     0m3.798s

This PR:

time ./build/bin/wasm-opt --code-folding --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling -o /dev/null ./test3.wasm

real    2m2.380s
user    2m25.700s
sys     0m2.449s
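
For scale, the two wall-clock (`real`) figures above can be compared directly, taking 5m45.996s = 345.996 s and 2m2.380s = 122.380 s:

$$
\frac{345.996}{122.380} \approx 2.83, \qquad 1 - \frac{122.380}{345.996} \approx 0.646
$$

That is roughly a 2.8x speedup, or about 65% less wall-clock time, on this one test case. Each figure comes from a single run, so the ratio should be read as approximate.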

Benchmark regression test

Test case: https://jetbrains.github.io/kotlinconf-app/73cbe24d7cf5a54d37ad.wasm
On main

Performance counter stats for 'build/bin/wasm-opt 73cbe24d7cf5a54d37ad.wasm -all --code-folding -o /dev/null' (10 runs):

        4837936912      task-clock                       #    1.445 CPUs utilized               ( +-  0.51% )
               114      context-switches                 #   23.564 /sec                        ( +-  7.58% )
                 7      cpu-migrations                   #    1.447 /sec                        ( +- 16.88% )
             46271      page-faults                      #    9.564 K/sec                       ( +-  0.00% )
       13431328103      instructions                     #    1.21  insn per cycle              ( +-  0.01% )
       11125222873      cycles                           #    2.300 GHz                         ( +-  0.51% )
          64641504      branch-misses                                                           ( +-  1.26% )

            3.3484 +- 0.0221 seconds time elapsed  ( +-  0.66% )

On current PR

 Performance counter stats for 'build/bin/wasm-opt 73cbe24d7cf5a54d37ad.wasm -all --code-folding -o /dev/null' (10 runs):

        4802304211      task-clock                       #    1.437 CPUs utilized               ( +-  0.47% )
               125      context-switches                 #   26.029 /sec                        ( +-  6.50% )
                 8      cpu-migrations                   #    1.666 /sec                        ( +- 14.20% )
             46272      page-faults                      #    9.635 K/sec                       ( +-  0.00% )
       13391520427      instructions                     #    1.21  insn per cycle              ( +-  0.01% )
       11043221889      cycles                           #    2.300 GHz                         ( +-  0.47% )
          59021679      branch-misses                                                           ( +-  1.24% )

            3.3427 +- 0.0207 seconds time elapsed  ( +-  0.62% )
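
Comparing the two elapsed-time figures above:

$$
3.3484\ \text{s} - 3.3427\ \text{s} = 0.0057\ \text{s}
$$

The difference is smaller than the reported spread of either measurement (0.0221 s and 0.0207 s), so these runs show no measurable change on this test case.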

@Changqing-JING Changqing-JING requested a review from a team as a code owner April 14, 2026 09:16
@Changqing-JING Changqing-JING requested review from tlively and removed request for a team April 14, 2026 09:16
@Changqing-JING Changqing-JING marked this pull request as draft April 14, 2026 09:16
@Changqing-JING Changqing-JING marked this pull request as ready for review April 15, 2026 14:02
@Changqing-JING Changqing-JING marked this pull request as draft April 15, 2026 14:23
@Changqing-JING Changqing-JING force-pushed the opt/compile-speed3 branch 2 times, most recently from c2c710f to 82d92bb April 16, 2026 03:07
@Changqing-JING Changqing-JING changed the title from "perf: cache repeated tree walks to avoid O(N^2) in optimizeTerminatingTails in CodeFolding" to "[NFC] cache repeated tree walks to avoid O(N^2) in optimizeTerminatingTails in CodeFolding" Apr 16, 2026
@Changqing-JING Changqing-JING marked this pull request as ready for review April 16, 2026 05:01
@Changqing-JING Changqing-JING marked this pull request as draft May 7, 2026 02:42
@Changqing-JING Changqing-JING marked this pull request as ready for review May 7, 2026 09:51
@Changqing-JING

@kripken
Could you help review this PR?

@tlively tlively left a comment

LGTM with some extra comments.

Comment thread src/passes/CodeFolding.cpp
@Changqing-JING Changqing-JING requested a review from tlively May 8, 2026 02:02